Overview

Dataset statistics

Number of variables: 20
Number of observations: 1551
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 785.3 B

Variable types

CAT: 10
NUM: 9
BOOL: 1

Reproduction

Analysis started: 2020-11-06 23:19:41.253851
Analysis finished: 2020-11-06 23:20:09.459704
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

can_id has a high cardinality: 1551 distinct values [High cardinality]
can_nam has a high cardinality: 1545 distinct values [High cardinality]
can_off_sta has a high cardinality: 57 distinct values [High cardinality]
can_cit has a high cardinality: 939 distinct values [High cardinality]
can_sta has a high cardinality: 56 distinct values [High cardinality]
cov_sta_dat has a high cardinality: 180 distinct values [High cardinality]
cov_end_dat has a high cardinality: 126 distinct values [High cardinality]
net_ope_exp is highly correlated with ind_con and 5 other fields [High Correlation]
ind_con is highly correlated with net_ope_exp and 5 other fields [High Correlation]
tot_con is highly correlated with ind_con and 5 other fields [High Correlation]
tot_dis is highly correlated with ind_con and 4 other fields [High Correlation]
net_con is highly correlated with ind_con and 2 other fields [High Correlation]
ope_exp is highly correlated with ind_con and 4 other fields [High Correlation]
tot_rec is highly correlated with ind_con and 4 other fields [High Correlation]
can_sta is highly correlated with can_off_sta [High Correlation]
can_off_sta is highly correlated with can_sta [High Correlation]
ind_con is highly skewed (γ1 = 23.73771695) [Skewed]
net_ope_exp is highly skewed (γ1 = 24.63124139) [Skewed]
tot_con is highly skewed (γ1 = 22.70877387) [Skewed]
tot_dis is highly skewed (γ1 = 22.03489511) [Skewed]
net_con is highly skewed (γ1 = 26.97269546) [Skewed]
ope_exp is highly skewed (γ1 = 22.40544929) [Skewed]
tot_rec is highly skewed (γ1 = 22.24268841) [Skewed]
can_off_dis has 274 (17.7%) zeros [Zeros]
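
The γ1 values in the skewness warnings are moment-based skewness coefficients. As a rough illustration of how such a number arises, the sketch below computes the third standardized moment in plain Python. The function name `skewness` and the sample lists are hypothetical, and the `bias_corrected` option is an assumption about the estimator; the report does not state which variant it uses.

```python
import math


def skewness(values, bias_corrected=True):
    """Return the moment-based skewness (gamma_1) of a sequence of numbers."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Second and third central moments (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    g1 = m3 / m2 ** 1.5
    if bias_corrected:
        # Adjusted Fisher-Pearson coefficient (assumed variant, see lead-in).
        return g1 * math.sqrt(n * (n - 1)) / (n - 2)
    return g1


# Hypothetical data: a symmetric sample and one with a single large value.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
long_tail = [1.0, 1.0, 2.0, 2.0, 3.0, 250.0]

print(skewness(symmetric))
print(skewness(long_tail))
```

A few very large values are enough to push the coefficient far above zero, which is consistent with the monetary columns here, where the maximum exceeds the 95th percentile by orders of magnitude.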

Variables

can_id
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 1551
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
S6HI00271 | 1 | 0.1%
S0HI00126 | 1 | 0.1%
P40003576 | 1 | 0.1%
S0IN00095 | 1 | 0.1%
H6RI01112 | 1 | 0.1%
H6MI01226 | 1 | 0.1%
H2MO08067 | 1 | 0.1%
H8CA41139 | 1 | 0.1%
S6CT05108 | 1 | 0.1%
H6FL18147 | 1 | 0.1%
Other values (1541) | 1541 | 99.4%

Length

Max length: 9
Mean length: 9
Min length: 9

Value | Count | Frequency (%)
Uppercase_Letter | 24 | 70.6%
Decimal_Number | 10 | 29.4%

Value | Count | Frequency (%)
Latin | 24 | 70.6%
Common | 10 | 29.4%

Value | Count | Frequency (%)
ASCII | 34 | 100.0%

can_nam
Categorical

HIGH CARDINALITY
UNIFORM
Distinct count: 1545
Unique (%): 99.6%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
FLYNN, MICHAEL | 2 | 0.1%
SPOTORNO, FRANK | 2 | 0.1%
MARSHALL, ROBERT | 2 | 0.1%
PAUL, RAND | 2 | 0.1%
BURK, JOHN GUNTHER JR | 2 | 0.1%
RUBIO, MARCO | 2 | 0.1%
SCALISE, STEVE MR. | 1 | 0.1%
KIRK, MARK STEVEN | 1 | 0.1%
JOHNSON, BILL | 1 | 0.1%
MCKINLEY, DAVID B. MR. | 1 | 0.1%
Other values (1535) | 1535 | 99.0%

Length

Max length: 36
Mean length: 17.31914894
Min length: 8

Value | Count | Frequency (%)
Uppercase_Letter | 26 | 74.3%
Other_Punctuation | 5 | 14.3%
Open_Punctuation | 1 | 2.9%
Space_Separator | 1 | 2.9%
Dash_Punctuation | 1 | 2.9%
Close_Punctuation | 1 | 2.9%

Value | Count | Frequency (%)
Latin | 26 | 74.3%
Common | 9 | 25.7%

Value | Count | Frequency (%)
ASCII | 35 | 100.0%

can_off
Categorical

Distinct count: 3
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
H | 1318 | 85.0%
S | 188 | 12.1%
P | 45 | 2.9%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Uppercase_Letter | 3 | 100.0%

Value | Count | Frequency (%)
Latin | 3 | 100.0%

Value | Count | Frequency (%)
ASCII | 3 | 100.0%

can_off_sta
Categorical

HIGH CARDINALITY
HIGH CORRELATION
Distinct count: 57
Unique (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
CA | 164 | 10.6%
FL | 144 | 9.3%
TX | 100 | 6.4%
NY | 85 | 5.5%
NC | 63 | 4.1%
IL | 62 | 4.0%
PA | 56 | 3.6%
MD | 50 | 3.2%
US | 45 | 2.9%
OH | 43 | 2.8%
Other values (47) | 739 | 47.6%

Length

Max length: 2
Mean length: 2
Min length: 2

Value | Count | Frequency (%)
Uppercase_Letter | 24 | 100.0%

Value | Count | Frequency (%)
Latin | 24 | 100.0%

Value | Count | Frequency (%)
ASCII | 24 | 100.0%

can_off_dis
Real number (ℝ≥0)

ZEROS
Distinct count: 54
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.414571244
Minimum: 0
Maximum: 53
Zeros: 274
Zeros (%): 17.7%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 5
Q3: 11
95-th percentile: 31
Maximum: 53
Range: 53
Interquartile range (IQR): 10

Descriptive statistics

Standard deviation: 10.24022526
Coefficient of variation (CV): 1.216963403
Kurtosis: 4.338587712
Mean: 8.414571244
Median Absolute Deviation (MAD): 7.334474836
Skewness: 2.014306308
Sum: 13051
Variance: 104.8622133
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 3.5 8.5 13.5 19.5 27.5 53. ], "bayesian blocks" binning strategy used)
Value | Count | Frequency (%)
0 | 274 | 17.7%
1 | 140 | 9.0%
2 | 112 | 7.2%
3 | 112 | 7.2%
4 | 94 | 6.1%
8 | 90 | 5.8%
5 | 85 | 5.5%
6 | 76 | 4.9%
7 | 66 | 4.3%
9 | 53 | 3.4%
Other values (44) | 449 | 28.9%

Value | Count | Frequency (%)
0 | 274 | 17.7%
1 | 140 | 9.0%
2 | 112 | 7.2%
3 | 112 | 7.2%
4 | 94 | 6.1%

Value | Count | Frequency (%)
53 | 3 | 0.2%
52 | 5 | 0.3%
51 | 2 | 0.1%
50 | 2 | 0.1%
49 | 2 | 0.1%

can_par_aff
Categorical

Distinct count: 18
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
REP | 789 | 50.9%
DEM | 648 | 41.8%
IND | 38 | 2.5%
LIB | 24 | 1.5%
GRE | 11 | 0.7%
OTH | 9 | 0.6%
UNK | 6 | 0.4%
NNE | 5 | 0.3%
NPA | 5 | 0.3%
DFL | 4 | 0.3%
Other values (8) | 12 | 0.8%

Length

Max length: 3
Mean length: 2.996776273
Min length: 1

Value | Count | Frequency (%)
Uppercase_Letter | 19 | 100.0%

Value | Count | Frequency (%)
Latin | 19 | 100.0%

Value | Count | Frequency (%)
ASCII | 19 | 100.0%
 
can_inc_cha_ope_sea
Categorical

Distinct count: 3
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
CHALLENGER | 735 | 47.4%
INCUMBENT | 419 | 27.0%
OPEN | 397 | 25.6%

Length

Max length: 10
Mean length: 8.194068343
Min length: 4

Value | Count | Frequency (%)
Uppercase_Letter | 15 | 100.0%

Value | Count | Frequency (%)
Latin | 15 | 100.0%

Value | Count | Frequency (%)
ASCII | 15 | 100.0%

can_cit
Categorical

HIGH CARDINALITY
Distinct count: 939
Unique (%): 60.5%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
LAS VEGAS | 19 | 1.2%
NEW YORK | 16 | 1.0%
LOS ANGELES | 14 | 0.9%
CHICAGO | 14 | 0.9%
MIAMI | 13 | 0.8%
HOUSTON | 12 | 0.8%
INDIANAPOLIS | 11 | 0.7%
WASHINGTON | 11 | 0.7%
ORLANDO | 11 | 0.7%
JACKSONVILLE | 10 | 0.6%
Other values (929) | 1420 | 91.6%

Length

Max length: 20
Mean length: 8.916827853
Min length: 2

Value | Count | Frequency (%)
Uppercase_Letter | 26 | 86.7%
Other_Punctuation | 2 | 6.7%
Space_Separator | 1 | 3.3%
Dash_Punctuation | 1 | 3.3%

Value | Count | Frequency (%)
Latin | 26 | 86.7%
Common | 4 | 13.3%

Value | Count | Frequency (%)
ASCII | 30 | 100.0%

can_sta
Categorical

HIGH CARDINALITY
HIGH CORRELATION
Distinct count: 56
Unique (%): 3.6%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
CA | 167 | 10.8%
FL | 149 | 9.6%
TX | 100 | 6.4%
NY | 85 | 5.5%
IL | 64 | 4.1%
NC | 63 | 4.1%
PA | 57 | 3.7%
MD | 49 | 3.2%
OH | 44 | 2.8%
VA | 42 | 2.7%
Other values (46) | 731 | 47.1%

Length

Max length: 2
Mean length: 2
Min length: 2

Value | Count | Frequency (%)
Uppercase_Letter | 24 | 100.0%

Value | Count | Frequency (%)
Latin | 24 | 100.0%

Value | Count | Frequency (%)
ASCII | 24 | 100.0%

can_zip
Real number (ℝ≥0)

Distinct count: 1419
Unique (%): 91.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54554585.17
Minimum: 603
Maximum: 989449767
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 603
5-th percentile: 7751
Q1: 28605.5
median: 53216
Q3: 89139
95-th percentile: 535461697
Maximum: 989449767
Range: 989449164
Interquartile range (IQR): 60533.5

Descriptive statistics

Standard deviation: 183220823.6
Coefficient of variation (CV): 3.358486241
Kurtosis: 11.62751188
Mean: 54554585.17
Median Absolute Deviation (MAD): 98165446.25
Skewness: 3.527330958
Sum: 8.461416159e+10
Variance: 3.356987022e+16
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[6.03000000e+02 1.00095000e+04 1.00385000e+04 1.17500000e+04 1.90175000e+04 ... 9.80080000e+04 9.81210000e+04 9.96775000e+04 1.85538082e+08 9.89449767e+08], "bayesian blocks" binning strategy used)
Value | Count | Frequency (%)
22314 | 4 | 0.3%
32801 | 4 | 0.3%
20910 | 3 | 0.2%
90017 | 3 | 0.2%
32174 | 3 | 0.2%
45373 | 3 | 0.2%
89074 | 3 | 0.2%
60540 | 3 | 0.2%
30263 | 3 | 0.2%
32853 | 3 | 0.2%
Other values (1409) | 1519 | 97.9%

Value | Count | Frequency (%)
603 | 1 | 0.1%
680 | 1 | 0.1%
791 | 1 | 0.1%
841 | 1 | 0.1%
920 | 1 | 0.1%

Value | Count | Frequency (%)
989449767 | 1 | 0.1%
986420020 | 1 | 0.1%
985083041 | 1 | 0.1%
970311456 | 1 | 0.1%
959880984 | 1 | 0.1%

cov_sta_dat
Categorical

HIGH CARDINALITY
Distinct count: 180
Unique (%): 11.6%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
1/1/2015 | 664 | 42.8%
1/1/2016 | 234 | 15.1%
7/1/2015 | 107 | 6.9%
4/1/2016 | 103 | 6.6%
10/1/2015 | 94 | 6.1%
4/1/2015 | 82 | 5.3%
7/1/2016 | 18 | 1.2%
12/1/2015 | 9 | 0.6%
9/1/2015 | 7 | 0.5%
6/1/2015 | 7 | 0.5%
Other values (170) | 226 | 14.6%

Length

Max length: 10
Mean length: 8.203739523
Min length: 8

Value | Count | Frequency (%)
Decimal_Number | 10 | 90.9%
Other_Punctuation | 1 | 9.1%

Value | Count | Frequency (%)
Common | 11 | 100.0%

Value | Count | Frequency (%)
ASCII | 11 | 100.0%

cov_end_dat
Categorical

HIGH CARDINALITY
Distinct count: 126
Unique (%): 8.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
10/19/2016 | 862 | 55.6%
9/30/2016 | 354 | 22.8%
6/30/2016 | 73 | 4.7%
3/31/2016 | 32 | 2.1%
12/31/2015 | 17 | 1.1%
10/15/2016 | 11 | 0.7%
9/30/2015 | 10 | 0.6%
11/28/2016 | 9 | 0.6%
8/10/2016 | 7 | 0.5%
7/15/2016 | 6 | 0.4%
Other values (116) | 170 | 11.0%

Length

Max length: 10
Mean length: 9.585428756
Min length: 8

Value | Count | Frequency (%)
Decimal_Number | 10 | 90.9%
Other_Punctuation | 1 | 9.1%

Value | Count | Frequency (%)
Common | 11 | 100.0%

Value | Count | Frequency (%)
ASCII | 11 | 100.0%

ind_con
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
Distinct count: 1521
Unique (%): 98.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 974037.4839
Minimum: 5
Maximum: 231831604.4
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 5
5-th percentile: 726
Q1: 15537.44
median: 134465.05
Q3: 563404.17
95-th percentile: 2265976.015
Maximum: 231831604.4
Range: 231831599.4
Interquartile range (IQR): 547866.73

Descriptive statistics

Standard deviation: 7354090.908
Coefficient of variation (CV): 7.550110781
Kurtosis: 672.4994603
Mean: 974037.4839
Median Absolute Deviation (MAD): 1315618.833
Skewness: 23.73771695
Sum: 1510732137
Variance: 5.408265309e+13
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[5.00000000e+00 2.45000000e+02 1.12173500e+03 7.64450000e+03 1.89335000e+04 ... 1.44887120e+06 2.23481619e+06 3.80798662e+06 1.43565620e+07 2.31831604e+08], "bayesian blocks" binning strategy used)
Value | Count | Frequency (%)
150 | 4 | 0.3%
200 | 4 | 0.3%
950 | 3 | 0.2%
700 | 3 | 0.2%
50 | 3 | 0.2%
100 | 3 | 0.2%
500 | 3 | 0.2%
3550 | 2 | 0.1%
215 | 2 | 0.1%
1400 | 2 | 0.1%
Other values (1511) | 1522 | 98.1%

Value | Count | Frequency (%)
5 | 1 | 0.1%
10 | 1 | 0.1%
15 | 1 | 0.1%
20 | 2 | 0.1%
25 | 1 | 0.1%

Value | Count | Frequency (%)
231831604.4 | 1 | 0.1%
105799882.7 | 1 | 0.1%
92036123.51 | 1 | 0.1%
63461402.63 | 1 | 0.1%
45362044.95 | 1 | 0.1%

net_ope_exp
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
Distinct count: 1542
Unique (%): 99.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4384710.413
Minimum: 1.8
Maximum: 1954397343
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 1.8
5-th percentile: 3050.905
Q1: 25807.36
median: 197032.64
Q3: 804576.485
95-th percentile: 3429628.83
Maximum: 1954397343
Range: 1954397342
Interquartile range (IQR): 778769.125

Descriptive statistics

Standard deviation: 61660546.07
Coefficient of variation (CV): 14.06262678
Kurtosis: 699.0083624
Mean: 4384710.413
Median Absolute Deviation (MAD): 7453357.034
Skewness: 24.63124139
Sum: 6800685850
Variance: 3.802022941e+15
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.80000000e+00 1.60000000e+01 1.08685550e+04 2.00826150e+04 3.17612550e+04 ... 2.45639220e+06 3.89883309e+06 1.44552169e+07 1.26103406e+08 1.95439734e+09], "bayesian blocks" binning strategy used)
Value | Count | Frequency (%)
1915655.96 | 2 | 0.1%
6988 | 2 | 0.1%
77314.34 | 2 | 0.1%
8576.29 | 2 | 0.1%
5700 | 2 | 0.1%
83145.56 | 2 | 0.1%
212 | 2 | 0.1%
6000 | 2 | 0.1%
1150 | 2 | 0.1%
304961.94 | 1 | 0.1%
Other values (1532) | 1532 | 98.8%

Value | Count | Frequency (%)
1.8 | 1 | 0.1%
4.59 | 1 | 0.1%
10 | 1 | 0.1%
11 | 1 | 0.1%
15 | 1 | 0.1%

Value | Count | Frequency (%)
1954397343 | 1 | 0.1%
923650997.6 | 1 | 0.1%
724308424.2 | 1 | 0.1%
673695919.4 | 1 | 0.1%
394899772.8 | 1 | 0.1%

tot_con
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
Distinct count: 1539
Unique (%): 99.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1262476.528
Minimum: 10
Maximum: 231837226.4
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 10
5-th percentile: 1946.245
Q1: 20956
median: 168808.53
Q3: 1066731.79
95-th percentile: 3511399.245
Maximum: 231837226.4
Range: 231837216.4
Interquartile range (IQR): 1045775.79

Descriptive statistics

Standard deviation: 7508667.972
Coefficient of variation (CV): 5.947570357
Kurtosis: 624.5412706
Mean: 1262476.528
Median Absolute Deviation (MAD): 1572373.082
Skewness: 22.70877387
Sum: 1958101094
Variance: 5.638009471e+13
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000000e+01 6.97000000e+02 1.03680000e+04 3.20197650e+04 1.08367545e+05 ... 2.80897135e+06 4.75468138e+06 1.39049995e+07 2.02897909e+07 2.31837226e+08], "bayesian blocks" binning strategy used)
Value | Count | Frequency (%)
500 | 2 | 0.1%
750 | 2 | 0.1%
7300 | 2 | 0.1%
101148.61 | 2 | 0.1%
1851509.95 | 2 | 0.1%
3550 | 2 | 0.1%
10400 | 2 | 0.1%
600 | 2 | 0.1%
2700 | 2 | 0.1%
380 | 2 | 0.1%
Other values (1529) | 1531 | 98.7%

Value | Count | Frequency (%)
10 | 1 | 0.1%
20 | 2 | 0.1%
30 | 1 | 0.1%
67 | 1 | 0.1%
84.83 | 1 | 0.1%

Value | Count | Frequency (%)
231837226.4 | 1 | 0.1%
114492189.3 | 1 | 0.1%
92137218.65 | 1 | 0.1%
63466990.92 | 1 | 0.1%
45818117.38 | 1 | 0.1%

tot_dis
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
Distinct count: 1542
Unique (%): 99.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1348706.665
Minimum: 1.8
Maximum: 238962741.3
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 1.8
5-th percentile: 4151.03
Q1: 29161.555
median: 239804.03
Q3: 966803.295
95-th percentile: 3548315.6
Maximum: 238962741.3
Range: 238962739.5
Interquartile range (IQR): 937641.74

Descriptive statistics

Standard deviation: 9244104.991
Coefficient of variation (CV): 6.854051536
Kurtosis: 539.3898085
Mean: 1348706.665
Median Absolute Deviation (MAD): 1717476.814
Skewness: 22.03489511
Sum: 2091844038
Variance: 8.545347709e+13
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.80000000e+00 1.64535000e+04 3.16909700e+04 8.11371150e+04 1.45140505e+05 ... 2.45932472e+06 4.30110120e+06 1.35942085e+07 2.22781545e+07 2.38962741e+08], "bayesian blocks" binning strategy used)
Value | Count | Frequency (%)
8576.29 | 2 | 0.1%
1150 | 2 | 0.1%
1970657.96 | 2 | 0.1%
482422.91 | 2 | 0.1%
46196 | 2 | 0.1%
83369.66 | 2 | 0.1%
20000 | 2 | 0.1%
77314.34 | 2 | 0.1%
212 | 2 | 0.1%
111269.03 | 1 | 0.1%
Other values (1532) | 1532 | 98.8%

Value | Count | Frequency (%)
1.8 | 1 | 0.1%
15 | 1 | 0.1%
17 | 1 | 0.1%
18.92 | 1 | 0.1%
29.5 | 1 | 0.1%

Value | Count | Frequency (%)
238962741.3 | 1 | 0.1%
232031346.9 | 1 | 0.1%
93373187.27 | 1 | 0.1%
64258231.64 | 1 | 0.1%
51191082.94 | 1 | 0.1%

net_con
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
Distinct count: 1537
Unique (%): 99.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4349405.511
Minimum: 10
Maximum: 2096279791
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 10
5-th percentile: 1780.5
Q1: 20956.645
median: 167644.88
Q3: 1063289.95
95-th percentile: 3558825.865
Maximum: 2096279791
Range: 2096279781
Interquartile range (IQR): 1042333.305

Descriptive statistics

Standard deviation: 62633106.12
Coefficient of variation (CV): 14.40038322
Kurtosis: 836.3497738
Mean: 4349405.511
Median Absolute Deviation (MAD): 7228835.44
Skewness: 26.97269546
Sum: 6745927948
Variance: 3.922905983e+15
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000000e+01 2.77750000e+03 1.36545100e+04 3.23500000e+04 1.04157695e+05 ... 2.78668387e+06 4.64028406e+06 1.78534685e+07 1.32561365e+08 2.09627979e+09], "bayesian blocks" binning strategy used)
Value | Count | Frequency (%)
380 | 2 | 0.1%
30 | 2 | 0.1%
450 | 2 | 0.1%
500 | 2 | 0.1%
2700 | 2 | 0.1%
101148.61 | 2 | 0.1%
1826909.95 | 2 | 0.1%
6500 | 2 | 0.1%
750 | 2 | 0.1%
1150 | 2 | 0.1%
Other values (1527) | 1531 | 98.7%

Value | Count | Frequency (%)
10 | 1 | 0.1%
20 | 2 | 0.1%
30 | 2 | 0.1%
84.83 | 1 | 0.1%
85 | 1 | 0.1%

Value | Count | Frequency (%)
2096279791 | 1 | 0.1%
827930839.9 | 1 | 0.1%
715555724.8 | 1 | 0.1%
468441873.4 | 1 | 0.1%
410046441 | 1 | 0.1%

ope_exp
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
Distinct count: 1542
Unique (%): 99.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1250837.711
Minimum: 1.8
Maximum: 238374891.3
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 1.8
5-th percentile: 3344.665
Q1: 25716.385
median: 197156.1
Q3: 796958.075
95-th percentile: 3294936.93
Maximum: 238374891.3
Range: 238374889.5
Interquartile range (IQR): 771241.69

Descriptive statistics

Standard deviation: 9064417.319
Coefficient of variation (CV): 7.246677358
Kurtosis: 555.206951
Mean: 1250837.711
Median Absolute Deviation (MAD): 1638259.716
Skewness: 22.40544929
Sum: 1940049290
Variance: 8.216366134e+13
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.80000000e+00 1.60000000e+01 7.48344000e+03 2.00983000e+04 4.51750350e+04 ... 1.97415859e+06 3.91712380e+06 1.35092901e+07 2.19796380e+07 2.38374891e+08], "bayesian blocks" binning strategy used)
Value | Count | Frequency (%)
6000 | 2 | 0.1%
1150 | 2 | 0.1%
83369.66 | 2 | 0.1%
1916057.96 | 2 | 0.1%
77314.34 | 2 | 0.1%
218 | 2 | 0.1%
6988 | 2 | 0.1%
212 | 2 | 0.1%
8576.29 | 2 | 0.1%
1448434.4 | 1 | 0.1%
Other values (1532) | 1532 | 98.8%

Value | Count | Frequency (%)
1.8 | 1 | 0.1%
4.59 | 1 | 0.1%
11 | 1 | 0.1%
15 | 1 | 0.1%
17 | 1 | 0.1%

Value | Count | Frequency (%)
238374891.3 | 1 | 0.1%
226685620.9 | 1 | 0.1%
87216086.77 | 1 | 0.1%
61598973.9 | 1 | 0.1%
46337371.79 | 1 | 0.1%

tot_rec
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
Distinct count: 1540
Unique (%): 99.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1480109.543
Minimum: 10
Maximum: 254957194.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 12.2 KiB

Quantile statistics

Minimum: 10
5-th percentile: 4705.235
Q1: 32032.765
median: 255237.7
Q3: 1154644.285
95-th percentile: 3880867.285
Maximum: 254957194.9
Range: 254957184.9
Interquartile range (IQR): 1122611.52

Descriptive statistics

Standard deviation: 9610770.654
Coefficient of variation (CV): 6.493283354
Kurtosis: 550.395497
Mean: 1480109.543
Median Absolute Deviation (MAD): 1837178.231
Skewness: 22.24268841
Sum: 2295649900
Variance: 9.236691257e+13
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.00000000e+01 1.86324700e+04 3.14512250e+04 7.94808350e+04 2.10186135e+05 ... 2.36960292e+06 3.90605949e+06 5.30635959e+06 2.09268335e+07 2.54957195e+08], "bayesian blocks" binning strategy used)
Value | Count | Frequency (%)
7220 | 2 | 0.1%
1150 | 2 | 0.1%
7875.97 | 2 | 0.1%
24710 | 2 | 0.1%
600 | 2 | 0.1%
1882950.05 | 2 | 0.1%
139424.1 | 2 | 0.1%
7300 | 2 | 0.1%
3550 | 2 | 0.1%
20000 | 2 | 0.1%
Other values (1530) | 1531 | 98.7%

Value | Count | Frequency (%)
10 | 1 | 0.1%
20 | 1 | 0.1%
30 | 1 | 0.1%
67.02 | 1 | 0.1%
110 | 1 | 0.1%

Value | Count | Frequency (%)
254957194.9 | 1 | 0.1%
236804528.5 | 1 | 0.1%
93469852.86 | 1 | 0.1%
65063783.43 | 1 | 0.1%
48126114.3 | 1 | 0.1%

winner
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 12.2 KiB

Value | Count | Frequency (%)
N | 1087 | 70.1%
Y | 464 | 29.9%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (up to sign, if a scale factor is negative), so for an exactly linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
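
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The common 1/n (or 1/(n-1)) factor cancels between numerator and
    # denominator, so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (dev_x * dev_y)


# Hypothetical data: y is an exact increasing linear function of x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
print(pearson_r(xs, ys))
```

Shifting or rescaling either list by a positive factor leaves the result unchanged, which is the invariance property mentioned in the description of r.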

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
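
As a rough sketch of that recipe, the snippet below ranks each variable and then applies the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho`, and the choice to give tied values the average of their positions, are assumptions made for illustration; the report does not say how ties are handled.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (dev_x * dev_y)


# Hypothetical data: y = x**3 is monotonic in x but not linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))
```

Because only the ordering of the values matters, a monotonic but nonlinear relationship such as the cubic above still yields a coefficient at the top of the range.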

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
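
The pair-counting definition above can be written out directly. The sketch below implements exactly that formula, often called tau-a; the function name `kendall_tau_a` and the sample data are hypothetical. Statistical libraries commonly report a tie-corrected variant (tau-b) instead, so values from this sketch may differ from the report when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y count as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: two of the three pairs agree in order, one disagrees.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This version compares every pair and is therefore quadratic in the number of observations, which is fine for illustration but slow for large datasets.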

Missing values

Sample

First rows

index | can_id | can_nam | can_off | can_off_sta | can_off_dis | can_par_aff | can_inc_cha_ope_sea | can_cit | can_sta | can_zip | cov_sta_dat | cov_end_dat | ind_con | net_ope_exp | tot_con | tot_dis | net_con | ope_exp | tot_rec | winner
0 | H2GA12121 | ALLEN, RICHARD W | H | GA | 12.0 | REP | INCUMBENT | AUGUSTA | GA | 30904.0 | 1/1/2015 | 10/19/2016 | 601274.50 | 907156.21 | 1074949.50 | 978518.98 | 1074949.50 | 908518.98 | 1094022.76 | Y
1 | H6PA02171 | EVANS, DWIGHT | H | PA | 2.0 | DEM | CHALLENGER | PHILADELPHIA | PA | 19138.0 | 11/2/2015 | 10/19/2016 | 1114711.02 | 1298831.83 | 1417545.22 | 1313583.69 | 1406719.06 | 1300557.53 | 1419270.92 | Y
2 | H6FL04105 | RUTHERFORD, JOHN | H | FL | 4.0 | REP | OPEN | JACKSONVILLE | FL | 32224.0 | 4/1/2016 | 10/19/2016 | 542105.38 | 656210.29 | 650855.38 | 675642.76 | 650855.38 | 656642.76 | 711287.85 | Y
3 | H4MT01041 | ZINKE, RYAN K | H | MT | 0.0 | REP | INCUMBENT | WHITEFISH | MT | 599373010.0 | 1/1/2015 | 10/19/2016 | 4317331.58 | 5055942.15 | 4980915.41 | 5200630.00 | 4938943.74 | 5073110.33 | 5190887.78 | Y
4 | H8CA09060 | LEE, BARBARA | H | CA | 13.0 | DEM | INCUMBENT | OAKLAND | CA | 94612.0 | 1/1/2015 | 10/19/2016 | 897123.61 | 949488.98 | 1205863.61 | 1112163.94 | 1197676.61 | 953436.94 | 1209811.57 | Y
5 | H6NC04037 | PRICE, DAVID E. | H | NC | 4.0 | DEM | INCUMBENT | RALEIGH | NC | 27602.0 | 1/1/2015 | 10/19/2016 | 328804.52 | 430826.04 | 728854.52 | 675837.98 | 725854.52 | 435688.13 | 733716.61 | Y
6 | H2WI02124 | POCAN, MARK | H | WI | 2.0 | DEM | INCUMBENT | MADISON | WI | 53701.0 | 1/1/2015 | 10/19/2016 | 393873.83 | 445438.15 | 970547.37 | 745903.44 | 970385.04 | 445465.15 | 970574.37 | Y
7 | H2MA09072 | LYNCH, STEPHEN | H | MA | 8.0 | DEM | INCUMBENT | SOUTH BOSTON | MA | 2127.0 | 1/1/2015 | 10/19/2016 | 767049.56 | 459790.68 | 1092269.56 | 493047.23 | 1092218.56 | 464636.23 | 1097115.11 | Y
8 | H6OR02116 | WALDEN, GREGORY P MR. | H | OR | 2.0 | REP | INCUMBENT | HOOD RIVER | OR | 970311456.0 | 1/1/2015 | 10/19/2016 | 969437.03 | 1911215.54 | 3012350.64 | 2866919.77 | 3004650.64 | 1937694.04 | 3134128.86 | Y
9 | H2MA04073 | KENNEDY, JOSEPH P III | H | MA | 4.0 | DEM | INCUMBENT | NEWTON | MA | 2459.0 | 1/1/2015 | 10/19/2016 | 1938192.38 | 1537844.39 | 2797967.38 | 1553016.80 | 2784362.26 | 1539411.68 | 3011173.93 | Y

Last rows

index | can_id | can_nam | can_off | can_off_sta | can_off_dis | can_par_aff | can_inc_cha_ope_sea | can_cit | can_sta | can_zip | cov_sta_dat | cov_end_dat | ind_con | net_ope_exp | tot_con | tot_dis | net_con | ope_exp | tot_rec | winner
1541 | H6MI10185 | FLYNN, MICHAEL | H | MI | 10.0 | REP | OPEN | SHELBY TOWNSHIP | MI | 48318.0 | 1/1/2015 | 9/30/2015 | 42250.00 | 53693.21 | 42250.00 | 60693.21 | 42250.00 | 53693.21 | 67252.81 | N
1542 | H6IL18153 | MELLON, ROBERT | H | IL | 18.0 | DEM | OPEN | QUINCY | IL | 62305.0 | 4/1/2015 | 9/30/2015 | 21538.00 | 19007.00 | 21538.00 | 19007.00 | 21538.00 | 19007.00 | 21538.00 | N
1543 | H6MS01180 | MILLS, MICHAEL P. JR | H | MS | 1.0 | REP | OPEN | FULTON | MS | 38843.0 | 1/1/2015 | 9/30/2015 | 98100.00 | 178100.00 | 101600.00 | 181600.00 | 101100.00 | 178100.00 | 181600.00 | N
1544 | H6MS01172 | PIRKLE, GREGORY D. | H | MS | 1.0 | REP | OPEN | TUPELO | MS | 38802.0 | 1/1/2015 | 9/30/2015 | 213756.00 | 459834.76 | 213756.00 | 462392.04 | 212756.00 | 460070.80 | 462392.04 | N
1545 | H6CA46124 | SANCHEZ, HEBERTO M | H | CA | 46.0 | DEM | OPEN | POMONA | CA | 91766.0 | 6/10/2015 | 9/30/2015 | 3318.96 | 3111.94 | 3318.96 | 3318.96 | 3111.94 | 3111.94 | 3318.96 | N
1546 | P00004275 | BROWN, HARLEY D | P | US | 0.0 | NNE | OPEN | NAMPA | ID | 83686.0 | 1/1/2015 | 7/8/2015 | 215.00 | 9655.02 | 12847.39 | 11683.89 | 10487.10 | 11683.89 | 12847.39 | N
1547 | H6NY11182 | LANE, JAMES | H | NY | 11.0 | GRE | OPEN | BROOKLYN | NY | 11215.0 | 1/1/2015 | 7/7/2015 | 12889.00 | 13356.89 | 14241.00 | 13983.11 | 14241.00 | 13356.89 | 14241.00 | N
1548 | H6MS01164 | COLLINS, NANCY | H | MS | 1.0 | REP | OPEN | TUPELO | MS | 38804.0 | 1/1/2015 | 7/1/2015 | 95538.35 | 247121.35 | 102538.35 | 247121.35 | 102538.35 | 247121.35 | 247121.35 | N
1549 | S6CA00618 | ALBERTSON, STEWART | S | CA | 0.0 | DEM | OPEN | REDWOOD CITY | CA | 94065.0 | 1/1/2015 | 6/30/2015 | 18949.00 | 15221.00 | 20949.00 | 30949.00 | 15250.00 | 15221.00 | 30949.00 | N
1550 | H6MS01198 | JONES, ROGER STARNER DR. | H | MS | 1.0 | REP | OPEN | PONTOTOC | MS | 38863.0 | 1/1/2015 | 6/30/2015 | 25808.00 | 528638.39 | 140858.00 | 538358.00 | 140858.00 | 528638.39 | 538358.00 | N